fix(cli): exit via os._exit to dodge native SIGABRT at shutdown #321

Merged
HumanBean17 merged 1 commit into master from fix/cli-deterministic-exit
Jun 14, 2026

@HumanBean17 (Owner)

Summary

Stops an intermittent CI failure where a one-shot java-codebase-rag subprocess (e.g. erase) crashes with exit code -6 (SIGABRT) during CPython interpreter shutdown:

✓ java-codebase-rag erase · finished in 2.66s
Fatal Python error: PyGILState_Release: thread state 0x… must be current when releasing
Python runtime state: finalizing
Extension modules: tree_sitter_java._binding, numpy.*, pyarrow.*

The command's logic completes successfully ({"success": true}) — the process is killed during finalization.
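For reference, this is how that -6 surfaces in a test harness. On POSIX, subprocess reports a child terminated by signal N as returncode -N, and Py_FatalError ends the process via abort(), i.e. SIGABRT. A minimal sketch, assuming a POSIX host; describe_returncode is a hypothetical helper (not part of java-codebase-rag), and os.abort() merely stands in for the fatal error.

```python
import signal
import subprocess
import sys


def describe_returncode(rc: int) -> str:
    """Translate a subprocess return code into a readable reason (POSIX only)."""
    # subprocess encodes "terminated by signal N" as a negative returncode -N.
    if rc < 0:
        return f"killed by {signal.Signals(-rc).name}"
    return f"exited with status {rc}"


if __name__ == "__main__":
    # os.abort() raises SIGABRT, the same signal Py_FatalError ends with.
    proc = subprocess.run([sys.executable, "-c", "import os; os.abort()"])
    print(proc.returncode, describe_returncode(proc.returncode))
```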

Root cause

_cmd_erase (java_codebase_rag/cli.py) imports lancedb, which brings in pyarrow's native thread pool. When the one-shot CLI returns into normal interpreter shutdown (raise SystemExit(main())), a lingering pyarrow/lance worker thread still holds a PyGILState; finalization tears the thread states down out from under it → Py_FatalError → abort() → exit code -6. Because this is a thread-timing race, the failure is flaky.

This is a distinct native crash from the kuzu scan SIGSEGV mitigated by #317 — different signal (SIGABRT vs SIGSEGV), different phase (finalization vs scan), different lib (pyarrow/lance vs kuzu).

Why now

It intermittently red-blocks unrelated PRs. It killed the erase step of test_cli_lifecycle_round_trip_init_increment_meta_erase on #320 (which touches only installer.py), while the same test passed on green master (#319) 90 minutes earlier. The crash happens inside the java-codebase-rag erase subprocess spawned by _run_cli, so #317's per-file pytest isolation does not cover it.

Fix

Route the installed java-codebase-rag entry through a thin wrapper that flushes stdout/stderr and calls os._exit(rc), skipping the racy finalization entirely. One-shot CLI processes have already done all real work and emitted their result before shutdown; finalization buys them nothing and is exactly where the race lives.

main() stays return-based so in-process test callers (cli.main([...])) keep working.
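Both halves of such a wrapper can be checked in isolation with a throwaway child process. This is a sketch, not the PR's test: the child program and the run helper are hypothetical. The child prints a result line, registers an atexit hook, and exits in one of three ways. With stdout on a pipe, Python block-buffers it, so os._exit() without a prior flush drops the result line. With the flush, the output and the exit code both survive, while atexit hooks (part of normal shutdown) never run.

```python
import os
import subprocess
import sys
import textwrap

# Hypothetical child program: prints a result line, registers an atexit hook,
# then exits according to argv[1]. Exit code 2 stands in for a real rc.
CHILD = textwrap.dedent("""
    import atexit, os, sys

    atexit.register(lambda: sys.stderr.write("atexit ran\\n"))
    print('{"success": true}')  # block-buffered when stdout is a pipe

    mode = sys.argv[1]
    if mode == "sys_exit":
        sys.exit(2)  # normal shutdown: atexit hooks, stream flush, finalization
    if mode == "flush_then_os_exit":
        sys.stdout.flush()
        sys.stderr.flush()
    os._exit(2)  # immediate exit: no atexit hooks, no implicit flush
""")


def run(mode: str) -> subprocess.CompletedProcess:
    """Run the child in the given mode and capture its exit code and output."""
    # Drop PYTHONUNBUFFERED so the child's stdout really is block-buffered.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    return subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True, text=True, env=env,
    )


if __name__ == "__main__":
    for mode in ("sys_exit", "os_exit_no_flush", "flush_then_os_exit"):
        p = run(mode)
        print(f"{mode:20} rc={p.returncode} stdout={p.stdout.strip()!r} "
              f"atexit_ran={'atexit ran' in p.stderr}")
```

All three modes exit with rc 2; only sys_exit runs the hook, and only os_exit_no_flush loses the result line.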

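```python
import os
import sys

def _console_script_main() -> None:
    rc = main()  # main() returns the command's int exit code
    # os._exit() skips shutdown, so flush buffered output explicitly first.
    sys.stdout.flush()
    sys.stderr.flush()
    os._exit(rc)  # exit now with the real code; no interpreter finalization
```

```diff
 # pyproject.toml
-java-codebase-rag = "java_codebase_rag.cli:main"
+java-codebase-rag = "java_codebase_rag.cli:_console_script_main"
```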

This is a root-cause fix at the mechanism level, not test suppression: the lifecycle round-trip still runs erase → init → increment → meta → erase and asserts exit codes; the change only makes the CLI process return its true exit code instead of being aborted by the finalization race.

Verification

  • ruff check . → clean
  • 2 new regression tests (wrapper flush + os._exit(rc) contract with rc 0 and 2; pyproject wiring guard): RED → GREEN
  • pytest tests/test_java_codebase_rag_cli.py → 54 passed, incl. the originally failing test_cli_lifecycle_round_trip_init_increment_meta_erase
  • pytest tests → 776 passed, 11 skipped
  • Erase subprocess stress loop (×30) → 30× rc=0, 0× rc=-6
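A wrapper-contract test like the first regression test above could be sketched with unittest.mock. This is a hypothetical reconstruction, not the PR's actual test: to stay self-contained it defines a stand-in main() and a copy of _console_script_main instead of importing java_codebase_rag.cli, and it patches os._exit so the test process itself survives.

```python
import os
import sys
from unittest import mock


def main() -> int:
    # Hypothetical stand-in for java_codebase_rag.cli.main; patched per test.
    return 0


def _console_script_main() -> None:
    # Copy of the PR's wrapper, so the sketch runs without the package installed.
    rc = main()
    sys.stdout.flush()
    sys.stderr.flush()
    os._exit(rc)


def record_wrapper_calls(rc: int) -> list:
    """Run the wrapper with main() returning rc; return flush/_exit calls in order."""
    events = []
    out, err = mock.Mock(), mock.Mock()
    out.flush.side_effect = lambda: events.append("stdout.flush")
    err.flush.side_effect = lambda: events.append("stderr.flush")
    with mock.patch.dict(globals(), {"main": lambda: rc}), \
            mock.patch.object(sys, "stdout", out), \
            mock.patch.object(sys, "stderr", err), \
            mock.patch.object(
                os, "_exit", side_effect=lambda code: events.append(("_exit", code))
            ):
        _console_script_main()  # returns normally because os._exit is mocked
    return events


if __name__ == "__main__":
    for rc in (0, 2):
        calls = record_wrapper_calls(rc)
        # Both streams are flushed before exiting, and rc is passed through as-is.
        assert calls == ["stdout.flush", "stderr.flush", ("_exit", rc)], calls
    print("wrapper contract holds for rc 0 and 2")
```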

Caveat: the crash was Linux-CI-specific (thread-timing race); local verification (macOS) confirms the fix mechanism (every erase now exits 0 via os._exit) and shows no regressions. CI will provide the final confirmation on Linux.
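A stress loop like the ×30 erase run can be approximated with a small helper that tallies return codes. The sketch below is hypothetical: stress() is not part of the repository, and the demo command is a plain Python one-liner because the erase invocation's arguments are not shown in this PR. Run on a Linux CI runner, a residual race would show up as -6 entries in the tally.

```python
import subprocess
import sys
from collections import Counter


def stress(cmd: list, runs: int = 30) -> Counter:
    """Run a one-shot command `runs` times and tally the return codes seen."""
    tally = Counter()
    for _ in range(runs):
        proc = subprocess.run(cmd, capture_output=True)
        tally[proc.returncode] += 1
    return tally


if __name__ == "__main__":
    # Placeholder command; on a real checkout this would be the erase
    # invocation (["java-codebase-rag", "erase", ...]) with its actual arguments.
    print(stress([sys.executable, "-c", "pass"], runs=5))
```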

User-visible changes / reindex / env / ontology

None. No schema, ranking, ontology, env-var, or re-index impact — purely the CLI process's shutdown path.

🤖 Generated with Claude Code

A pyarrow/lance worker thread (loaded via lancedb in lifecycle commands) can
outlive CPython finalization in a one-shot CLI subprocess and trip
PyGILState_Release (SIGABRT, exit -6). It's a thread-timing race — flaky —
and it intermittently red-blocked unrelated PRs: it killed the erase step of
test_cli_lifecycle_round_trip_init_increment_meta_erase on PR #320 (which
touches only installer.py), while the same test passed on green master #319.

Route the installed `java-codebase-rag` entry through _console_script_main,
which flushes stdout/stderr and calls os._exit(rc) instead of returning into
the racy teardown. main() stays return-based so in-process test callers keep
working.

Co-Authored-By: Claude <noreply@anthropic.com>
HumanBean17 merged commit faeeb38 into master on Jun 14, 2026
1 check passed